One Day in Twitter: Topic Detection Via Joint Complexity
نویسندگان
چکیده
In this paper we introduce a novel method to perform topic detection in Twitter based on the recent and novel technique of Joint Complexity. Instead of relying on words as most other existing methods which use bag-ofwords or n-gram techniques, Joint Complexity relies on String Complexity which is defined as the cardinality of a set of all distinct factors, subsequences of characters, of a given string. Each short sequence of text is decomposed in linear time into a memory efficient structure called Suffix Tree and by overlapping two trees, in linear or sublinear average time, we obtain the Joint Complexity defined as the cardinality of factors that are common in both trees. The method has been extensively tested for Markov sources of any order for a finite alphabet and gave good approximation for text generation and language discrimination. One key take-away from this approach is that it is language-agnostic since we can detect similarities between two texts in any loosely character-based language. Therefore, there is no need to build any specific dictionary or stemming method. The proposed method can also be used to capture a change of topic within a conversation, as well as the style of a specific writer in a text. In this paper we exploit a dataset collected by using the Twitter streaming API for one full day, and we extract a significant number of topics for every timeslot. Copyright c © by the paper’s authors. Copying permitted only for private and academic purposes. In: S. Papadopoulos, D. Corney, L. Aiello (eds.): Proceedings of the SNOW 2014 Data Challenge, Seoul, Korea, 08-04-2014, published at http://ceur-ws.org
منابع مشابه
Real Time Event Detection in Twitter
Event detection has been an important task for a long time. When it comes to Twitter, new problems are presented. Twitter data is a huge temporal data flow with much noise and various kinds of topics. Traditional sophisticated methods with a high computational complexity aren’t designed to handle such data flow efficiently. In this paper, we propose a mixture Gaussian model for bursty word extr...
متن کاملDetection of Twitter Users' Attitudes about Flu Vaccine based on the Content and Sentiment Analysis of the Sent Tweets
Introduction: The influenza vaccine is one of the controversial challenges in today's societies. Considering the importance of using the flu vaccine in preventing the spread of influenza virus, the Twitter network, as a rich source of data, provides suitable conditions for research in this field to examine the attitudes of different people about this vaccine. The results in one hand will help h...
متن کاملDetection of Twitter Users' Attitudes about Flu Vaccine based on the Content and Sentiment Analysis of the Sent Tweets
Introduction: The influenza vaccine is one of the controversial challenges in today's societies. Considering the importance of using the flu vaccine in preventing the spread of influenza virus, the Twitter network, as a rich source of data, provides suitable conditions for research in this field to examine the attitudes of different people about this vaccine. The results in one hand will help h...
متن کاملExemplar-Based Topic Detection in Twitter Streams
Detecting topics in Twitter streams has been gaining an increasing amount of attention. It can be of great support for communities struck by natural disasters, and could assist companies and political parties understand users’ opinions and needs. Traditional approaches for topic detection focus on representing topics using terms, are negatively affected by length limitation and the lack of cont...
متن کاملTwitter event detection: combining wavelet analysis and topic inference summarization
Today streaming text mining plays an important role within real-time social media mining. Given the amount and cadence of the data generated by those platforms, classical text mining techniques are not suitable to deal with such new mining challenges. Event detection is no exception, available algorithms rely on text mining techniques applied to pre-known datasets processed with no restrictions...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014